Abstract

A broadcast mode may augment peer-to-peer overlay 
networks with an efficient, scalable data replication func- 
tion, but may also give rise to a virtual link layer in VPN- 
type solutions. We introduce a generic, simple broadcasting 
mechanism that operates in the prefix space of distributed 
hash tables without signaling. This paper concentrates 
on the performance analysis of the prefix flooding scheme. 
Starting from simple models of recursive k-ary trees, we an- 
alytically derive distributions of hop counts and the repli- 
cation load. Further on, extensive simulation results are 
presented based on an implementation within the OverSim 
framework. Comparisons are drawn to Scribe, taken as a 
general reference model for group communication accord- 
ing to the shared, rendezvous-point-centered distribution 
paradigm. The prefix flooding scheme thereby confirmed 
its widely predictable performance and consistently outper- 
formed Scribe in all metrics. Reverse path selection in over- 
lays is identified as a major cause of performance degrada- 
tion. 

Keywords: Prefix flooding, DHT, random recursive k- 
ary trees, overlay network simulation, Pastry, Scribe 

1. Introduction 

A broadcast service is commonly supported on the network
and data link layer. Analogous to the IP layer, application
overlays may require unselective group communication.
Distributed Hash Tables (DHT) like Chord [13]
and Pastry [12] do not consider broadcast, i.e., a mechanism
to communicate to all parties of one DHT instance without
their active participation.

The broadcast mode admits two unique features. The a 
priori awareness of the data flooding task may significantly 
enhance efficiency, e.g., by taking advantage of network or 
(shared) media specifics. Further on, it enables a message 
exchange among mutually unknown parties without a re- 
quirement of specific service awareness or any form of sig- 
naling. Broadcast is thus the fundamental mechanism for 
unselective data synchronization and for the autonomous 
coordination of distributed systems. 

On the application layer, there are likewise versatile use 
cases for broadcast communication. Applications range 
from broadband data dissemination in video conferencing 
or data replication, over service and peer discovery up to 
the implementation of a virtual link layer in VPN-type so- 
lutions. 

Broadcast is a special case of multicast. This distribution 
mechanism guarantees to reach not only a subset, but all 
nodes of a dedicated domain without explicit registration. 
The set of all nodes is also called the broadcast domain. It 
is worth noting that a broadcast domain can be arranged on 
different layers with varying inherent capabilities. Connect- 
ing nodes, e.g., with an Ethernet hub to a shared segment 
facilitates packet distribution based on the physical network 
structure. It is limited by the supporting medium, i.e., the 
range of signal propagation. The equivalent holds for the 
wireless domain, where the medium is always shared, but 
of restrictive propagation ranges. Participating nodes do 
not need a specific network logic in sending and receiv- 
ing broadcast data on the physical layer. Broadcast sup- 
port, however, on a dedicated layer should be independent 
of the underlying tier, which may accelerate it. In the ex- 
ample of IP, broadcast addresses will be directly mapped to 
the Ethernet broadcast address, such that all Ethernet hosts
of one segment receive the data independent of their subnet
assignment, but in contrast to network access, packets can 
be forwarded on the network layer beyond physical bounds. 

In general, broadcast in logical networks can be enabled 
by passing data incrementally to direct overlay neighbors. If 
the graph of nodes is connected and contains the source, all 
nodes will be reached. DHT structures allow such a connected
neighborhood graph to be derived. Any node can send packets
to an address adjacent to its own key space. In contrast
to IP, every possible address is associated with one overlay 
peer. Such a simple ring broadcast scheme sends the packet 
to exactly one neighbor, reaching all n DHT peers after n 
hops. As an alternative approach to the case of unknown 
neighborhoods, a dedicated, well-known replicator can be 
placed in the network like the Broadcast and Unknown 
Server in ATM. Such a rendezvous point-based approach re- 
quires extra signaling to register receivers. The parallelism 
of distribution is bounded by the replicator, which sustains 
the overall duplication load and may be a single point of 
failure. 

In the following, we will present a general broadcast al- 
gorithm along with optimizations for Pastry, that uses the 
DHT structure more efficiently and replicates data stepwise 
to all neighbors in prefix space. This scheme works without 
peer involvement, especially without signaling. We model 
and analyze the approach theoretically and in simulation, 
drawing comparison to a generic rendezvous point approach 
derived from Scribe.

This paper first introduces the prefix flooding algorithm
in the next section and continues as follows. Section 3
presents an overview of the performance measures applied in
our analysis, while analytical models are utilized in
section 4 to derive distributions for the core properties of
replication load and hop count. Results of our simulation
studies are outlined in the subsequent section 5. Related
work is reviewed in section 6, followed by a final
discussion and conclusion in section 7.

2. Broadcast by Prefix Flooding 

For an efficient application layer broadcast we need to 
define a strategy for data replication on the overlay. In 
a DHT, the peer identifiers are composed using an alpha- 
bet of k digits and have a predefined length. All nodes of 
a structured overlay can be naturally arranged in a prefix 
tree, branching recursively at the longest common prefix of k
neighboring vertices. The leaves are labeled with the overlay
identifiers of the DHT members and the inner vertices
represent the shared prefix (cf. figure 1).

This tree can be interpreted as a distribution tree, defining
the broadcast domain of a specific DHT instance. If a
broadcast packet is sent starting from the root of the tree
towards the leaves, the packet will be replicated where
prefixes branch. Actually, the broadcast domain (prefix tree)
decomposes into many smaller broadcast sub-domains
(subtrees), in which the propagations continue in parallel.
Following the nature of broadcast, a packet will be forwarded
locally, after it has arrived at a root of a subtree.

Figure 1. DHT Node within a Prefix Tree -
Associated Vertices are Highlighted.

This approach allows all peers of a DHT to be reached, because
the data is flooded to the leaves, which represent the
overlay nodes. A peer receiving a broadcast is required to 
determine the current branching position on the distribution 
tree to decide on further packet replication. This context 
awareness can be achieved by sending broadcast packets 
carrying the prefix currently addressed, which we call des- 
tination prefix. This destination prefix will grow in length 
with every forwarding hop while descending the tree. 

We denote the length of a prefix A by |A|. Given two
prefixes A and B, the longest common prefix will be written
C = LCP(A, B). The relation of C being a prefix of A is
written as C ⊆ A. Consequently, C ⊆ A and A ⊆ C if and
only if C = A.

A proper specification for data distribution, i.e., a routing 
procedure on prefix trees, requires further definitions. The 
two sub-problems that need to be solved are a routing to a 
prefix and the association of nodes with prefixes: 

Definition 1 A prefix C is associated with an overlay node
of ID N, if and only if C ⊆ N.

As shown in figure 1, all inner vertices on the shortest path
from the root to a node are associated with that node.

Concordantly, a prefix routing can be defined as forward- 
ing a packet to the node the destination prefix is associated 
with. In general, there may be several nodes owning an 
associated prefix, since prefix-to-node mapping is only
assured to be unique for prefixes of full key length. For
flooding a prefix tree, a forwarding peer needs to route
packets to all 'live' neighboring prefixes (cf. figure 1).
Consequently,
a peer must store corresponding nodes for each prefix adja- 
cent to its associated vertices in a prefix neighbor set. It is 
important that these tables are complete. A complete neighbor
set meets the following condition: Whenever an overlay node
exists for a given prefix, then the neighbor set will
provide an entry for this prefix. In particular it follows that 
each overlay node is a destination in at least one set, since 
node keys are uniquely assigned. It is worth noting that a 
prefix needs not to be included in any neighbor set, if there 
is no peer sharing it. The requirement of complete neigh- 
bor tables will usually be fulfilled by the key-based routing 
service, i.e., underlying DHT routing maintenance. 

A source initiates a broadcast by starting with the empty
destination prefix. This corresponds to delivering the data to
all prefix neighbors N_i. At each neighbor a packet will be
further replicated. The destination prefix is replaced with
the new target address. In detail, the algorithm works as
follows:

Prefix Flooding

▷ On arrival of a packet with destination prefix C
▷ at a DHT node
1  for all N_i IDs in prefix neighbor set
2     do if (LCP(C, N_i) = C)        ▷ N_i downtree neighbor
3           then C_new ← N_i
4                Forward packet to C_new

If an inner vertex of the prefix tree fails, e.g., due to 
churn, the corresponding sub-tree is empty or includes fur- 
ther peers. The replacement of the next hop for a given pre- 
fix Cnew will be achieved by the underlying DHT. In gen- 
eral, in the case of overlay network failures the reliability of 
prefix flooding relies directly on the deployed DHT mainte- 
nance. 

If all peers have a complete set of prefix neighbors, the 
scheme guarantees that all overlay nodes will be accessed, 
no peer receives a broadcast packet more than once and the 
algorithm terminates. 

Theorem 1 (Coverage) If the prefix neighbor sets are com- 
plete at all nodes, then the PREFIX FLOODING assures 
packet distribution to all overlay nodes. 

Theorem 2 (Uniqueness) Each overlay node will receive 
a broadcast packet at most once using the PREFIX FLOOD- 
ING. 

Complete proofs for both theorems are elaborated in
[15]. Theorem 1 can be proven by induction over the number
of overlay nodes, while theorem 2 follows from the
observation that each routing prefix uniquely identifies the
root of a subtree in prefix space.

From theorem 2 it can be concluded that the PREFIX
FLOODING does not induce loops, proving the assumption
that the algorithm terminates.
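
The algorithm and the two theorems can be illustrated by a small Python sketch. It is a minimal sketch under stated assumptions, not the implementation evaluated in this paper: identifiers are fixed-length strings over a three-digit alphabet, the helper names lcp, neighbor_set and prefix_flood are hypothetical, and every prefix neighbor set is built from global knowledge of all IDs (taking the smallest matching ID per adjacent prefix), whereas a real DHT would obtain it from its routing maintenance.

```python
from collections import deque


def lcp(a: str, b: str) -> str:
    """Longest common prefix of two identifier strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return a[:n]


def neighbor_set(node: str, nodes: list, alphabet: str) -> dict:
    """Complete prefix neighbor set: one owner per live adjacent prefix.

    Assumption: built from global knowledge for illustration only.
    """
    table = {}
    for i in range(len(node)):
        for digit in alphabet:
            if digit == node[i]:
                continue  # own branch, not a neighbor
            prefix = node[:i] + digit
            owners = [m for m in nodes if m.startswith(prefix)]
            if owners:
                table[prefix] = min(owners)
    return table


def prefix_flood(source: str, nodes: list, alphabet: str):
    """Flood from `source`; return reception counts and hop counts."""
    tables = {n: neighbor_set(n, nodes, alphabet) for n in nodes}
    receptions = {n: 0 for n in nodes}
    hops = {source: 0}
    queue = deque([(source, "")])  # source starts with the empty prefix
    while queue:
        node, c = queue.popleft()
        for prefix, owner in tables[node].items():
            if lcp(c, prefix) == c:  # downtree neighbor
                receptions[owner] += 1
                hops[owner] = hops[node] + 1
                queue.append((owner, prefix))  # C_new <- N_i
    return receptions, hops


nodes = ["000", "012", "101", "110", "111", "221"]
source = "101"
receptions, hops = prefix_flood(source, nodes, "012")
```

In this example every node other than the source receives the packet exactly once, and the queue empties after a finite number of steps, in line with theorems 1 and 2.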



2.1. Implementation for Pastry 

The idea of prefix routing is implemented in Pastry. The 
Pastry routing table of a peer reflects directly the elements 
of a prefix tree. Thus each peer carries a subset of the prefix 
tree in its routing table. Merging the routing tables of all
peers would form the global distribution tree. In flooding
their routing tables, Pastry peers flood the prefix tree, which 
corresponds to the overlay broadcast described by the Pre- 
fix Flooding. In detail, the idea is as follows: A source 
sends its data to all routing table entries. Each destination 
prefix corresponds to the root of a broadcast sub-domain. 
The receiving peers determine their position in the tree, i.e., 
the height D in the prefix tree, at which they receive the 
data, and forward the packets downwards. This is equal to 
sending data to all routing table entries starting at row D+ 1. 
Note that the tree position can easily be derived by denoting 
the row number, which reduces the packet size in contrast to 
encoding the entire key. For Pastry the Prefix Flooding 
reads in pseudo code: 

Pastry Prefix Flooding

▷ On arrival of a packet with destination prefix length
▷ D at Pastry node of ID K with routing table A
▷ containing l rows and k columns
1  for all i ← D + 1 to l
2     do for all j ← 1 to k
3           do if a_ij ≠ Unspecified ∧ a_ij ≠ K
4                 then D_new ← i
5                      Forward Packet To a_ij

If the routing table is filled correctly, all theorems for the
Prefix Flooding are also valid for Pastry, since the Pastry
routing table corresponds to the set of prefix neighbors
{N_i}. However, Pastry's reactive maintenance does not
guarantee that each overlay node will provide complete routing
states [12], which conflicts with the Prefix Flooding.
Therefore we augmented Pastry with a proactive routing
maintenance mechanism, which performs initial key lookups
to fill the routing table similar to the "fix_fingers"
routine in Chord.
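
The row-based variant can be sketched in the same spirit. The following Python fragment is an illustrative assumption, not the OverSim implementation: IDs are tuples of digits, the function names routing_table and pastry_flood are hypothetical, and the routing table is filled completely from global knowledge, which mimics the effect of the proactive maintenance described above. Rows are indexed from 0 in the code, so row index i corresponds to row i + 1 of the pseudo code.

```python
import random


def routing_table(node, nodes, k):
    """Complete Pastry-like table: row i holds, per digit, one node that
    shares the first i digits with `node` (own column holds `node`)."""
    table = []
    for i in range(len(node)):
        row = []
        for digit in range(k):
            if digit == node[i]:
                row.append(node)  # entry equals own ID, skipped on flooding
                continue
            prefix = node[:i] + (digit,)
            owners = [m for m in nodes if m[:i + 1] == prefix]
            row.append(min(owners) if owners else None)  # None = Unspecified
        table.append(row)
    return table


def pastry_flood(source, nodes, k):
    """Flood all rows below the received prefix length; count receptions."""
    tables = {n: routing_table(n, nodes, k) for n in nodes}
    received = {n: 0 for n in nodes}
    pending = [(source, 0)]  # (node, destination prefix length D)
    while pending:
        node, depth = pending.pop()
        table = tables[node]
        for i in range(depth, len(table)):  # rows D + 1 .. l
            for entry in table[i]:
                if entry is not None and entry != node:
                    received[entry] += 1
                    pending.append((entry, i + 1))  # D_new <- row number
    return received


rng = random.Random(7)  # fixed seed, arbitrary choice
k, length = 4, 6
nodes = sorted({tuple(rng.randrange(k) for _ in range(length))
                for _ in range(50)})
source = nodes[0]
received = pastry_flood(source, nodes, k)
```

Only the row number travels with the packet here, which mirrors the remark that the tree position can be encoded more compactly than the entire key.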

3. Performance Measures 

The prefix flooding approach to broadcasting introduces 
prefix trees as a control plane to packet forwarding. This 
simple mechanism operates without additional signaling, 
which is an apparent advantage. The quality of the routing 
as inherited from a hash-generated prefix tree needs closer 
inspection. Ideally, packet distribution should be fast and 
minimize traffic and replication load in the network. To ob- 
tain an overall insight into the routing quality, we evaluate 
the prefix flooding scheme in theory and in a discrete event
simulation according to the following metrics and compare
our results to Scribe. Scribe serves as a generic reference
model for schemes using dedicated replicators, and is
based on the same DHT, Pastry. It is worth noting that the 
performance metrics do not measure the multicast specific 
properties of Scribe. Thus, choosing Scribe for comparison 
is reasonable. 

Packet replication load quantifies the number of pack- 
ets a single peer has to forward. This metric reflects the 
number of direct neighbors per node in the distribution tree. 
The overall characteristic for the prefix routing is then given 
by the distribution of the replication load obtained from all 
forwarding nodes. 

Hop count counts the number of overlay routing traver- 
sals that a packet needs on its way from the source to the 
destination. Note that the hop count affects the travel time, 
because every additional hop results directly in an addi- 
tional transmission time. In this sense the travel time is 
correlated with the hop count. 

Travel time describes the time a data packet travels from
the source until it reaches a receiver measured in seconds.
This absolute value depends on the one hand on the number
of hops between the nodes and on the other hand on the
transmission time inherited from the hop by hop link delays
and the packet size of the transmitted data.

Relative delay penalty measures the ratio of the travel 
time for data packets delivered via Scribe and the travel 
time resulting from the prefix flooding scheme. This rel- 
ative factor gives an indication of the parallelism of packet 
forwarding. 

4. Analytical Models 

To understand the performance of the prefix flooding 
scheme, we first present analytical considerations. Based 
on the shape of the prefix tree, we gain insight in the struc- 
tural behavior of protocols for traversing prefix distribution 
trees. As this analysis is only based on the tree itself, fringe 
effects known from simulations are isolated. 

4.1. Replication Load 

In the following, we want to derive the distribution of
the replication load in a prefix tree. For the general case of
prefix flooding in a structured overlay of N nodes using a
prefix alphabet of k digits, the following upper bound of the
replication load can be derived immediately.

Theorem 3 Any overlay node in a prefix flooding domain of
N receivers and an alphabet with $k \geq 2$ digits will replicate
a data packet at most $\log_2(N)(k - 1)$ times.

For the distribution function of the replication load in a
fully populated prefix tree, we need to determine replication
values along with their frequencies. Recalling the picture of
a full prefix tree for an alphabet with k digits, every node
except the leaves has k children. The number of packet
replications for an overlay peer is equal to the overall
number of forwarding neighbors, which depends on the tree
position, where a peer receives the packet. Per level the
replication load is k - 1. Consequently, in a fully populated
k-ary prefix tree of height h, replication occurs only at
multiples of k - 1, the number of neighbors in prefix space.
For $j \geq 0$ we denote these discrete values by
$v_{h,k}(j) = (h - j)(k - 1)$.

Figure 2. Self-Similarity of Prefix Subtrees
due to the Recursive Nature of k-ary Trees

To derive the replication frequency, we quantify the
occurrence of the replication load $v_{h,k}(j)$. Since we know
the load of a peer forwarding packets at height j, the
frequency can be calculated by counting the number of peers
that fulfill the replication condition. The latter corresponds
to the number of (sub-)trees with height h - j, because every
peer serves as forwarder for one tree. Starting at the source
in a full prefix tree, the structure decomposes into k - 1
subtrees with height h - 1, k(k - 1) subtrees of height h - 2,
etc. (cf. figure 2). At every level of the full prefix tree,
there is an exponential growth in the number of inner vertices
representing the root of new subtrees. Thus, the frequency of
(h - j)-size subtrees must increase exponentially with their
decreasing height. In detail there are $k^{j-1} \cdot (k - 1)$
subtrees of height h - j, which account for a replication load
of $(h - j) \cdot (k - 1)$.

Theorem 4 Given a fully populated k-ary prefix tree of
height h. Then the frequency $f_{h,k}(v_{h,k}(j))$ for a
replication load $v_{h,k}(j) = (h - j)(k - 1)$ is given by

$$ f_{h,k}(v_{h,k}(j)) = \begin{cases} 1 & \text{for } j = 0, \\ k^{j-1} \cdot (k - 1) & \text{for } 0 < j \leq h. \end{cases} \tag{1} $$

Proof by induction. We assume a full k-ary prefix tree of
height h. The case j = 0 corresponds to the (single) source
that replicates data to $h(k - 1)$ neighbors as derived above.

The induction is done with respect to h - j, the height of
a subtree (cf. figure 2).

Base case: If h - j = 1, we have to show that the
replication load $v_{h,k}(h - 1)$ appears (k - 1) times. In a
tree of height 1, the source sends the data to all further
leaves directly, which equals k - 1.

Induction step: Assume the statement holds for h - j.
We have to show that the statement holds for h - j + 1, i.e.,
$f_{h,k}(v_{h,k}(h - j + 1)) = k^{h-j}(k - 1)$.

Consider a full prefix tree of height h - j + 1. It consists
of k subtrees of height h - j. The replication load of a node
in a tree of height (h - j + 1) equals the sum of all neighbors
in these k subtrees. Using the induction hypothesis the overall
replication load reads

$$ k \cdot f_{h,k}(v_{h,k}(h - j)) = k \cdot k^{h-j-1}(k - 1) = k^{h-j}(k - 1). \qquad \Box $$

The overall number of packet replications is easily
identified as the number of leaf nodes, since there are no
packet duplications and each peer receives the broadcast. The
number of leaves of a full k-ary tree of height h equals
$k^h$, such that we arrive at the following

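Corollary 1 The probability distribution $P_{h,k}$ for packet
replication multiplicities reads

$$ P_{h,k}(v_{h,k}(j)) = \begin{cases} k^{-h} & \text{for } j = 0, \\ k^{j-h-1} \cdot (k - 1) & \text{for } 1 \leq j \leq h, \\ 0 & \text{otherwise.} \end{cases} \tag{2} $$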

Corollary 2 The average replication load for a node in a
full prefix tree $T_{h,k}$ is given by $1 + O(k^{-h})$, its
standard deviation by $\sqrt{k} + O(k^{-h})$.
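
The two corollaries can be checked numerically. The following Python sketch is an illustration under assumptions rather than part of the original analysis: the function name replication_pmf is hypothetical, and the range of j is taken to include j = h, i.e., the nodes that receive the packet without forwarding it (load 0), which is what makes the probabilities sum to one.

```python
import math


def replication_pmf(h: int, k: int) -> dict:
    """Map each replication load (h - j)(k - 1) to its probability
    in a fully populated k-ary prefix tree of height h."""
    total = k ** h  # number of leaves, used for normalization
    pmf = {}
    for j in range(h + 1):
        load = (h - j) * (k - 1)
        freq = 1 if j == 0 else k ** (j - 1) * (k - 1)
        pmf[load] = freq / total
    return pmf


def mean_and_std(pmf: dict):
    """First moment and standard deviation of a discrete distribution."""
    mean = sum(v * p for v, p in pmf.items())
    second = sum(v * v * p for v, p in pmf.items())
    return mean, math.sqrt(second - mean * mean)


h, k = 8, 16  # example parameters, chosen for illustration
pmf = replication_pmf(h, k)
mean, std = mean_and_std(pmf)
```

For these parameters the mean comes out as 1 - k^(-h) and the standard deviation lies very close to sqrt(k) = 4, as stated in corollary 2.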

Observing the weak dependence of the replication load
distribution on h and k, i.e., the tree shaping parameters,
it can be assumed that the model is sufficiently general to
grant insights into the qualitative replication behavior of
sparsely populated k-ary trees. We will see in section 5 that
the simulations support this assumption.

4.2. Hop Count 

As for the replication load, we firstly derive general mea- 
sures of the number of hops a packet travels from the source 
to any destination in the prefix flooding scheme. 

Theorem 5 Any overlay node in a structured broadcast domain
of N receivers and an alphabet with $k \geq 2$ digits will
receive a packet from prefix flooding after at most $\log_2(N)$
hops. In the presence of Pastry overlay routing, the number
of hops attained on average equals $\log_{2^b}(N)$ with $k = 2^b$.

We now want to return to considering a fully populated
prefix tree and derive the hop distribution thereof. The main
idea is similar to the replication load: A forwarding peer
sends the broadcast to k - 1 prefix neighbors, all of them
rooting an equally structured subtree of height h - 1. We are
counting the number of paths with a length reduced by one
herein. Additionally we count the frequency of paths for
the calculated hop count in the virtual subtree containing
the forwarder. This recursion results in

Theorem 6 Given a fully populated k-ary prefix tree of
height h, the frequency $f_{h,k}(j)$ of a hop count j occurring
in prefix flooding is given by

$$ f_{h,k}(j) = \binom{h}{j} (k - 1)^j. \tag{3} $$



Proof. A flooding packet arriving at node n after j hops will
admit a current destination prefix of length j. Being located
in a subtree of height h - j, n will forward the packet to
its downtree neighbors, thereby partitioning its subtree into
k - 1 further subtrees of height h - j - 1 (cf. figure 2). Due
to the recursive nature of the k-ary prefix tree, the frequency
distribution satisfies the recurrence relation

$$ f_{h,k}(j) = f_{h-1,k}(j) + (k - 1) \cdot f_{h-1,k}(j - 1) \tag{4} $$

with initial conditions $f_{1,k}(0) = 1$, $f_{1,k}(1) = k - 1$.
Inserting $f_{h,k}$ yields the claim. □
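
The step from the recurrence (4) to the closed form (3) can be verified mechanically. The sketch below is an illustration only; the function name hop_freq and the tested parameter ranges are arbitrary choices, not taken from the paper. It evaluates the recurrence with the stated initial conditions and compares the result with the binomial expression.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def hop_freq(h: int, j: int, k: int) -> int:
    """Frequency of hop count j in a full k-ary prefix tree of height h,
    evaluated from recurrence (4)."""
    if j < 0 or j > h:
        return 0  # no paths shorter than 0 or longer than the height
    if h == 1:
        return 1 if j == 0 else k - 1  # initial conditions
    return hop_freq(h - 1, j, k) + (k - 1) * hop_freq(h - 1, j - 1, k)


# Compare with the closed form for a range of small trees.
matches = all(
    hop_freq(h, j, k) == comb(h, j) * (k - 1) ** j
    for k in range(2, 17)
    for h in range(1, 11)
    for j in range(h + 1)
)
```

The agreement is an instance of Pascal's rule; summing the frequencies over j yields k^h, the number of leaves used for normalization in the following corollary.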

This result can be interpreted in two different ways.
Among all legitimate paths in downtree routing, i.e., of
length h, those of length j are selected and branch k - 1
times at each of the j intermediate prefix nodes.
Alternatively, flooding corresponds to a node discovery
process, where a node discovers its
$v_{h,k}(j) = (h - j)(k - 1)$ neighbors which in turn discover
their neighbors in the following step. Subsequent neighbor
discovery requires connect to the j-th part as only
$(h - j)(k - 1)/j$ nodes have further neighbors.

Following a similar argument as in corollary 1, it is clear
that normalization for hop count frequencies is given by
$k^h$, the number of leaf nodes in the full prefix tree.

Corollary 3 The probability distribution $H_{h,k}(j)$ of the
hop count for flooding a full prefix tree $T_{h,k}$ evaluates to

$$ H_{h,k}(j) = k^{-h} \cdot \binom{h}{j} (k - 1)^j. \tag{5} $$



Corollary 4 The average hop count at which a packet is
received from flooding in a full prefix tree $T_{h,k}$ is given
by $\langle H_{h,k} \rangle = (k - 1)/k \cdot h$, the standard
deviation of the hop count distribution (5) equals
$\sigma_{H_{h,k}} = \sqrt{(k - 1) \cdot h}/k$.



This average is almost independent of the prefix alphabet
k and can be in some sense interpreted as the counterpart
of the average replication load as seen in corollary 2. As
the average number of per hop replications is close to one,
packets travel down the entire tree and reach most of their
receivers after nearly h hops. The width of the hop count
distribution, its standard deviation, admits a weak
dependence on k, slowly decaying from its maximum at k = 2 as
$k^{-1/2}$.

In contrast to the replication load distribution, which
showed only a weak dependence on the tree shaping
parameters, the hop count results strongly depend on h for the
fully populated k-ary tree. The height h is directly related to
the number of nodes $k^h$ in this tree, which does not hold for
realistic scenarios. Thus a direct transfer to sparsely
populated random trees is questionable.

To derive a distribution for general distribution trees, 
evaluations are required on the class of all random k-ary 
trees. Unfortunately, this turns out to be difficult. Pro- 
ceeding in a significantly simpler, but reasonable approach, 
we restrict the analysis to the class of random recursive k- 
ary trees with a homogeneous probability p for independent 
edges. In this model, each vertex branches to each of its 
k — 1 possible outdegrees independently with probability p, 
thereby preserving the recursive nature of the fully
populated k-ary tree. Instead of equation (4), the hop frequency
of routing on this random recursive tree will be governed by
the modified rate equation

$$ f_{h,k}(j) = f_{h-1,k}(j) + p \cdot (k - 1) \cdot f_{h-1,k}(j - 1) \tag{6} $$

with $f_{1,k}(0) = 1$, $f_{1,k}(1) = p(k - 1)$.

This can be solved analogously to (4) and yields

Corollary 5 The probability distribution $H^{(p)}_{h,k}(j)$ of
the hop count for flooding a random recursive k-ary prefix tree
$T^{(p)}_{h,k}$ with homogeneous, independent edge probability p
evaluates to

$$ H^{(p)}_{h,k}(j) = (1 + p(k - 1))^{-h} \cdot \binom{h}{j} \cdot (p(k - 1))^j, \tag{7} $$

which attains the average value
$\langle H^{(p)}_{h,k} \rangle = \frac{p(k - 1)}{1 + p(k - 1)} \cdot h$,
and the standard deviation
$\sigma_{H^{(p)}} = \frac{\sqrt{p(k - 1) h}}{1 + p(k - 1)}$.

The introduced edge probability p is not a 'free'
parameter, but a function of the number of leaf nodes
$N = (1 + p(k - 1))^h$ in the tree. Solving this relation for
$p = \frac{N^{1/h} - 1}{k - 1}$, and inserting typical Pastry
parameters for k = 16, h = 128 and node numbers of our
simulations, will lead to the relatively small edge
probabilities, mean hop counts and standard deviations
displayed in table 1.





                     k = 16, h = 128

N                    10        100       1000      10000
p                    0.00122   0.00244   0.00370   0.00497
mean hop count       2.30      4.52      6.73      8.88
standard deviation   1.50      2.09      2.53      2.87

Table 1. Selected Link Probabilities, Mean
Hop Counts and Standard Deviations.



These analytical results will not only support a qualita- 
tive insight into the mechanisms of prefix-based packet dis- 
tribution, but will also show significant agreement with the 
simulation results presented in the subsequent section. 
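
The entries of table 1 follow from corollary 5 and the relation between N and p. The Python sketch below recomputes them; the function names edge_probability and hop_stats are hypothetical, and the computation uses unrounded intermediate values, so the results agree with the table only to within about 0.02 (the tabulated values appear to be based on rounded edge probabilities, which is an assumption on our side).

```python
import math


def edge_probability(n: int, k: int, h: int) -> float:
    """Solve N = (1 + p(k - 1))^h for the edge probability p."""
    return (n ** (1.0 / h) - 1.0) / (k - 1)


def hop_stats(n: int, k: int = 16, h: int = 128):
    """Edge probability, mean hop count and standard deviation
    for a random recursive k-ary prefix tree with N leaves."""
    p = edge_probability(n, k, h)
    x = p * (k - 1)  # expected number of live branches per vertex
    mean = x / (1.0 + x) * h
    std = math.sqrt(x * h) / (1.0 + x)
    return p, mean, std


# One row of results per simulated network size.
rows = {n: hop_stats(n) for n in (10, 100, 1000, 10000)}
for n, (p, mean, std) in rows.items():
    print(f"N={n:6d}  p={p:.5f}  mean={mean:.2f}  std={std:.2f}")
```

The mean grows linearly in log N while the standard deviation increases only slowly, which is the behavior the subsequent simulations are compared against.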



5. Simulation Results 



In this section, we will analyze the performance of the 
prefix flooding based on a stochastic discrete event simula- 
tion and compare to the behavior of the rendezvous point- 
based approach Scribe. Both, the prefix flooding and Scribe, 
are implemented on top of a proactive version of the DHT 
substrate Pastry. 

In detail, our simulations are performed on the network
simulator platform OMNeT++ 3.3 [14], supplemented by
a preliminary version of the overlay simulation package
OverSim [2] including Scribe and extended by the prefix
flooding implementation. Pastry has been configured as in
its original version [12]. In particular, we use a key length
of 128 and an alphabet size of 16, if not mentioned
otherwise. To investigate the scaling behavior of the
protocols, the simulations are conducted for a number of peers
varying by three orders of magnitude. None of the relative
metrics described in section 3 depend on the underlay. Thus
the Simple model [1] has been applied as the underlying
network with a homogeneous link delay of 1 ms to analyze the
network properties inside the overlay.

The analysis is not focusing on reliability aspects, which 
allows us to neglect churn. In particular, any effects of 
volatile nodes would be completely maintained by Pastry 
for the prefix flooding and partially for Scribe. Rendezvous 
point (RP) based schemes have to reorganize the distribu- 
tion tree due to failing RPs, resulting in DHTs by new key 
associations, which nevertheless is not addressed here. 

Summarizing the simulation scenario, we calculate the 
flooding performance on an arbitrary (k = 16)-ary prefix
tree with a fixed maximal height and a varying number
of leaves interconnected by links of identical weight. The 
broadcast will be initiated by a randomly selected leaf. 




[Figure 3, panels: (a) Prefix Flooding, k = 16; (b) Scribe, k = 16;
(c) Detail: Tail for Prefix Flooding, k = 16; (d) Detail: Tail for
Scribe, k = 16. Horizontal axis: Replication Load [Packets].]

Figure 3. Distribution of Packet Replication Comparing Prefix Flooding with Scribe

Figure 4. Hop Count Distribution for an Overlay of Size N



5.1. Replication Load

The distributions of the peer replication load for prefix
flooding and Scribe are displayed in figure 3. Both schemes
show an exponential decay around their common average
value of 1. However, the shapes of the distributions for the
two approaches vary significantly, which becomes apparent
at first from standard deviation values. While the widths
of the distributions for prefix flooding are small and almost
independent of network sizes, the corresponding values for
Scribe grow large, about linearly in the number of nodes.

Both broadcasting schemes produce a large number of
replications of values 0 and 1, but frequencies drastically
drop for higher multiplicities. The prefix flooding
distribution attains a much smoother decay, leaving significant
probability to replication values of 2 to 10. Smoothness is
even more pronounced for smaller alphabets, which for space
restrictions are omitted here. In contrast, Scribe decreases
faster from its average, decaying rapidly to probabilities
below 1/100 for replications larger than 2, fairly independent
of the alphabet k.

An exception from this overall shape can be observed
for the distribution of 10 peers in Scribe. Here, the
frequencies of replication values around 9 are strongly
enhanced. This border effect for very small networks can be
understood from analyzing distribution tails. As visualized in
the log-log plot 3(d), the distribution of Scribe is
heavy-tailed according to a power law decay, representing
remarkably high probabilities for very large replication
values up to 7800. Corresponding probabilities are accumulated
for small sized overlays.

In contrast, the prefix flooding distribution admits a strict
exponential decay, with tail weights vanishing above 50.
Replication values in prefix flooding are superimposed by
oscillating frequencies as visible in figure 3(c). The
resulting probability "bumps" are noticeable on different
scales for all overlays and can be explained by our theoretical
analysis, which reveals an exponential decay within the range
of multiples of (k - 1). Compared to the prerequisites of
corollary 1, the simulated overlays do not operate on full
k-ary prefix trees. Hence replication values do not only occur
as multiples of the branching factor, but level out with
neighboring values. Nevertheless, regarding the peaks of
the bumps, the population and replication pattern of the
k-ary trees remain clearly visible.

In both approaches, most of the peers receive the broadcast
without a need to forward it further. Scribe thereby
stresses a small number of peers to serve a much higher
replication load. Instead, the prefix flooding reduces the
maximal replication load by distributing the load evenly
over the neighbors.

5.2. Hop Count

The mean hop count distribution for different overlay 
sizes is shown in figure [4] In general, both schemes show 
the logarithmically growing hop path length dependent on 
the number of peers. With an increasing quantity of leaves, 
the height of prefix trees will increase logarithmically, as 
well, resulting in longer paths from the source and inter- 
mediate forwarders to the receivers. The mean hop count 
< X > for Scribe highlights approximately one additional 
node in contrast to the prefix flooding. 

For sufficiently large N > 10, the average of the
distribution for prefix flooding directly attains the mean
hop count calculated in theorem 5, around which all other
hop count values are centered. The hop count distribution
in Scribe shows a heavy-tailed behavior, which increases
with the overlay size, as indicated by the approximately
linear growth of the standard deviation. In contrast, prefix
flooding attains an almost constant variation. Consequently,
in prefix flooding the path lengths are tightly concentrated
around the logarithmically bounded average, while Scribe
builds up longer branches with higher weights.
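
The tight concentration of path lengths can be illustrated with a simple closed-form model. The sketch below assumes a full k-ary prefix tree of height h in which the hop count of a receiver equals the number of digits in which its key differs from the source key, so that hop counts follow a binomial distribution. This is an idealization chosen for illustration and is not the random recursive tree model behind theorem 5. The function names are hypothetical.

```python
from math import comb, sqrt


def hop_distribution(k, h):
    """P(hops = j), j = 0..h, for a uniformly chosen key in a full tree.

    A key needs one hop per digit that differs from the source key;
    each digit differs with probability (k - 1) / k.
    """
    n = k ** h
    return [comb(h, j) * (k - 1) ** j / n for j in range(h + 1)]


def mean_and_std(dist):
    """Mean and standard deviation of a discrete hop distribution."""
    mean = sum(j * p for j, p in enumerate(dist))
    var = sum((j - mean) ** 2 * p for j, p in enumerate(dist))
    return mean, sqrt(var)


if __name__ == "__main__":
    for h in (2, 3, 4):
        mean, std = mean_and_std(hop_distribution(16, h))
        print(f"k=16 h={h} N={16 ** h}: mean={mean:.3f} std={std:.3f}")
```

In this model the mean is h(k − 1)/k and the standard deviation is sqrt(h(k − 1))/k, so the mean grows with the tree height, i.e., logarithmically in N, while the spread grows only with the square root of the height.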

5.3. Relative Delay Penalty 

Figure 5 shows the relative delay penalty (RDP) as a
function of the network size for Scribe over prefix flooding.
In larger networks, Scribe packets travel slower than
prefix flooding data by a factor of about 1.4. The increased
delay penalty in small networks of about 10 peers reflects
the observation from figure 3(b) that almost all receivers
are addressed directly by the rendezvous point, which
replicates the packet for the full number of overlay nodes.
The more keys are allocated, the more branching points are
located close to the RP, resulting in longer paths and less
efficient parallelism in Scribe, in contrast to prefix flooding.
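
As a minimal sketch of how such a penalty can be computed, the following Python function averages the per-receiver ratio of a measured delay to a reference delay. Whether the evaluation averages per-receiver ratios or takes the ratio of mean delays is not stated here, so the per-receiver variant is an assumption. The function name and the delay values are hypothetical and serve only to make the example runnable.

```python
def relative_delay_penalty(delays, reference_delays):
    """Mean per-receiver ratio of delay to reference delay.

    Both arguments map a receiver identifier to a delay in the same
    unit; only receivers present in reference_delays are evaluated.
    """
    ratios = [delays[r] / reference_delays[r] for r in reference_delays]
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    # Made-up delays in milliseconds, not measurement results.
    scribe = {"a": 30.0, "b": 45.0}
    prefix = {"a": 20.0, "b": 30.0}
    print(relative_delay_penalty(scribe, prefix))
```

A value above 1 means that the first scheme delivers more slowly than the reference scheme.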

6. Related Work 

The principal approach for implementing broadcast on 
a pure DHT derives from recursive partitioning of the key 
space with data distribution following partition ranges. The 
prefix flooding operates in this sense, defining numerical in- 
terval boundaries from prefix transitions. The first idea of 
a broadcast based on nested intervals was proposed in [5].
The broadcast is sent to intervals of exponentially increas- 
ing scale as derived from the Chord routing table. 
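
The idea of nested intervals can be sketched as follows. Each forwarder sends the packet to its Chord fingers that fall into the arc it is responsible for, and hands each finger the arc up to the next finger. The code is one common formulation of interval-based broadcast on a Chord-like ring, written under the assumption of consistent routing tables. It is not claimed to match the algorithm of [5] in detail, and all function names are illustrative.

```python
import bisect


def successor(nodes, key):
    """First node clockwise from key (nodes must be sorted)."""
    i = bisect.bisect_left(nodes, key)
    return nodes[i % len(nodes)]


def fingers(nodes, n, m):
    """Distinct Chord fingers of n, ordered by clockwise distance."""
    size = 1 << m
    result = []
    for i in range(m):
        f = successor(nodes, (n + (1 << i)) % size)
        if f != n and f not in result:
            result.append(f)
    return result


def broadcast(nodes, source, m):
    """Return {node: hop count} for a broadcast on an m-bit ring."""
    size = 1 << m
    nodes = sorted(nodes)
    hops = {source: 0}
    # (node, limit): node covers the open arc (node, limit);
    # limit == node denotes the full ring (source only).
    pending = [(source, source)]
    while pending:
        n, limit = pending.pop()
        span = (limit - n) % size or size
        fs = fingers(nodes, n, m)
        for j, f in enumerate(fs):
            if (f - n) % size >= span:
                break  # f lies outside the arc (n, limit)
            nxt = fs[j + 1] if j + 1 < len(fs) else limit
            # f covers the arc up to the next finger, capped by limit.
            new_limit = nxt if 0 < (nxt - n) % size < span else limit
            assert f not in hops  # no node receives the packet twice
            hops[f] = hops[n] + 1
            pending.append((f, new_limit))
    return hops


if __name__ == "__main__":
    print(broadcast([0, 3, 5, 6, 12], source=0, m=4))
```

Because the arcs handed to the fingers are disjoint, every node is reached exactly once and N − 1 messages are sent in total.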

A generalization of this scheme is proposed in [6]. In
addition to a design independent of Chord, the authors
enhance their algorithm by reliability routines, which
guarantee a broadcast distribution independent of the
routing table states. This is achieved by delegating data
delivery for missing entries to subsequent forwarders.

The authors in (SI introduce a scheme which splits the
key space into d partitions of equal size and selects the first
node in clockwise direction as the responsible forwarder.
Otherwise similar to [5], this approach refrains from using
uneven, logarithmic partitioning.

An approach that cannot ensure a broadcast distribution
without data redundancy is presented in 0. The authors
combine a slightly enhanced version of [5] with an
epidemic distribution. All broadcast forwarders periodically
send the data to a randomly chosen neighbor, whereby
the protocol may deliver duplicates to the same neighbor.
All of the approaches mentioned above lack formal
verification, as well as analytical considerations regarding
data distribution in k-ary prefix trees. Most of the
algorithms are implemented on top of Chord, none of them
on Pastry, which natively offers proximity-aware prefix
routing.

A generalized construction scheme for partitioning the
key space is presented in [7]. The authors observe that any
contractive self-mapping function P of the key space with
a single fixed point a, i.e., P(a) = a, gives rise to a parent
relationship. Based on the parent relation P(a), a reverse
path can be set up for any node a, leading to a broadcast
distribution tree with the root a. Different parent functions
thus give rise to different trees at variable roots, which may
be used for load-sharing or redundancy purposes.

DHT-specific flooding was introduced in the early
work [11] for CAN (Content Addressable Network). In
contrast to Chord or Pastry, CAN maps node IDs to regions
representing coordinates in a partitioned d-dimensional
space. CAN broadcasts the data to all geographical
neighbors, thereby accounting for predecessors and
foreseeable redundancies. However, the partitioning of the
d-dimensional space may be uneven and result in data
duplication at sub-regions. Performance properties of
multicast on CAN are derived analytically in [?]. An extensive
simulation study of flooding and tree-based overlay
multicast over CAN and Pastry with respect to the underlay is
presented in flU. The authors show that CAN flooding is
outperformed by Pastry flooding, which relies on a more
efficient tree structure adaptive to the underlay.

Our implementation of the generalized prefix flooding is
similar to the Pastry flooding of Castro et al. [4]. The main
difference lies in the reactive routing maintenance, which
may result in data redundancy at the fallback forwarder [4].
The focus of their analysis of broadcast distribution lies in
the context of overlay multicast. Results are only based
on simulations. The measured metrics reflect performance
issues focusing on efforts imposed on the underlying
network. In this sense, our work can be understood as
complementary: We present a general prefix flooding and
investigate its inherent, structural properties using an
analytical model and simulations.



7. Discussion and Conclusions 

In this work, we have presented and analyzed broadcast- 
ing within distributed hash tables. A general prefix flooding 
approach, distributing data along prefix branches directly to 
receivers, is compared to a rendezvous point-based scheme 
which utilizes a shared tree rooted at a predefined anchor 
peer. Several phenomena of general interest could be ob- 
served. 

Divergent Path Length Distributions: Our simulation 
results confirm the mean hop difference of one between the 
prefix flooding and the rendezvous point-based approach 
Scribe. This additional, triangular hop in the overlay be- 
comes noteworthy when stretched in the underlay and then 
may put stress on several links. The major advantage of the 
prefix flooding, though, is its quite stable concentration of 
path length distribution around the average, attaining low 
variations independent of the overlay size. In general, P2P 
networks consist of volatile nodes. If we assume an over- 
lay with regular churn, i.e., session times in the range of 
minutes or larger, and a persistent number of peers on av- 
erage, the DHT moderately reorganizes key associations. 
Such structural modifications lead to changing paths within 
the overlay and in the worst case, a single arrival or depar- 
ture of a node may cause a data path to change drastically. 
In the prefix flooding, the path length only changes moder- 
ately for new and existing peers due to its narrow distribu- 
tion. In contrast, the heavy-tailed overlay hop count distri- 
bution of Scribe produces a largely inhomogeneous travel 
time, which complicates synchronous applications. 

Varying Replication Load: A high variation can also be
identified for the packet replication in Scribe. Similar to
prefix flooding, it is rather likely that peers forward with
a low replication load. Nevertheless, in the long tail of the
distribution, some nodes are required to replicate many
more packets, with values of up to 7,800 in large overlays
of 10,000 peers. The distribution of packet replication is
thus strongly unbalanced, requiring very low and very high
values to be served within the same scenario. Such behavior
does not only degrade the performance, but may threaten
stability and even cause conflicts with intrusion detection
systems.

In contrast to Scribe, prefix flooding guarantees
a replication load closely balanced around its average of
about 1. It can be tuned directly by the branching factor
k. As we know from the theoretical analysis of section 4,
packet replications occur as multiples of k − 1 in a full
prefix space. Decreasing k reduces the maximum number of
replications.
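
The effect of the branching factor can be made concrete with a small calculation. The sketch below assumes a full k-ary prefix tree in which the source, as the most heavily loaded peer, sends k − 1 copies on each of the h prefix levels. The bound h(k − 1) and the function names are assumptions of this idealized model, not results taken from the analysis.

```python
def tree_height(n_peers, k):
    """Smallest h with k**h >= n_peers (integer arithmetic only)."""
    h, capacity = 0, 1
    while capacity < n_peers:
        capacity *= k
        h += 1
    return h


def max_replication(n_peers, k):
    """Source load h * (k - 1) in a full k-ary prefix tree."""
    return tree_height(n_peers, k) * (k - 1)


if __name__ == "__main__":
    for k in (2, 4, 16):
        print(k, tree_height(10_000, k), max_replication(10_000, k))
    # 2 14 14
    # 4 7 21
    # 16 4 60
```

For 10,000 peers, reducing k from 16 to 2 lowers the maximum load in this model from 60 to 14 copies, at the price of a tree height that grows from 4 to 14 levels.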

An Overloaded Single Peer: The peers with extraordi- 
narily high packet replication load in Scribe have been iden- 
tified as the rendezvous points (RP). An appropriate treat- 
ment of such service nodes becomes more important under 
the aspect of unbalanced packet replication, but poses a se- 
vere conceptual problem in DHTs: The placement of this 
entity should account for node and network capacities, but 
in a DHT it is bound to the structural mapping of the
multicast group identifier to an overlay key. Any alternative
approach, e.g., selecting the RP address independently of
the group address, will break the key space semantics, with
the result that an overlay node cannot derive the RP
distribution address automatically.
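
The structural binding of the rendezvous point can be sketched in a few lines. The example assumes that the group key is a truncated SHA-1 hash of the group name and that the responsible node is the one numerically closest to this key on the identifier ring. Hash function, key length and function names are illustrative choices. The sketch shows only that the RP follows from the group name and the set of node identifiers, without regard to node or network capacities.

```python
import hashlib


def group_key(group_name, bits=128):
    """Map a group name to an overlay key (SHA-1, truncated to bits)."""
    digest = hashlib.sha1(group_name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") >> (160 - bits)


def rendezvous_point(group_name, node_ids, bits=128):
    """Node whose identifier is numerically closest to the group key."""
    key = group_key(group_name, bits)
    size = 1 << bits

    def ring_distance(node_id):
        d = abs(node_id - key)
        return min(d, size - d)

    return min(node_ids, key=ring_distance)


if __name__ == "__main__":
    nodes = [1 << 100, 1 << 120, 3 << 125]
    print(hex(rendezvous_point("group-a", nodes)))
```

Any peer can evaluate `rendezvous_point` locally from the group name alone, which is the property that would be lost if the RP were chosen independently of the group address.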

Our prefix-guided broadcast strictly adheres to a
forward-directed establishment of distribution trees. We
have shown that it generates efficient group communication
structures. The presented approach is thus particularly
promising for overlay multicast services. We have sketched
a structured multicast solution operating in prefix space
[16]; its elaboration is the subject of our ongoing work.
Further on, we will integrate our scheme into hybrid group
communication architectures [?].